Reviews: Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks

Neural Information Processing Systems

The paper addresses an important topic (implicit regularization in deep learning), is well-written, and although I did not verify the proofs in the appendices, I believe it is mathematically solid. Nonetheless, I have several concerns regarding originality, significance, and clarity (see below), which ultimately lead me to vote against acceptance. This is true of the proof techniques as well. The only part I view as potentially novel is the perturbation analysis, but that is unfortunately not discussed at all in the body of the paper. I recommend that the authors put much more focus on this aspect, as without it the paper is merely a straightforward extension of prior work.


Reviews: Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks

Neural Information Processing Systems

The paper studies the dynamics of discrete gradient descent for over-parameterized two-layer neural networks and shows that, under certain conditions on the input/output covariance matrices and the initialization, the components of the input-output map are learned sequentially. The reviewers appreciated the contributions of the paper, both theory and experiments, and found the paper well written. At the same time, one reviewer feels the assumptions are too strong, and another feels that some of the claims are misleading. Post-rebuttal, the reviewer concluded that the novelty of the paper is buried in the appendix and that a rewrite is needed to elucidate that novelty in the body of the paper. This AC agrees with R4 that the contributions relative to Lampinen and Ganguli need to be clearly established in the body of the paper and that a citation needs to be added.


A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix Factorization

Cao, Jian, Qian, Chen, Huang, Yihui, Chen, Dicheng, Gao, Yuncheng, Dong, Jiyang, Guo, Di, Qu, Xiaobo

arXiv.org Artificial Intelligence

Implicit regularization is an important way to interpret neural networks. Recent theory has begun to explain implicit regularization through the model of deep matrix factorization (DMF), analyzing the trajectory of discrete gradient dynamics during optimization. The step sizes in these discrete gradient dynamics are small but not infinitesimal, which matches the practical training of neural networks. So far, discrete gradient dynamics analysis has been applied successfully to shallow networks, but it runs into prohibitively complex computations for deep networks. In this work, we introduce another approach to explaining implicit regularization, namely landscape analysis, which focuses on low-gradient regions such as saddle points and local minima. We theoretically establish the connection between saddle-point escaping (SPE) stages and the matrix rank in DMF. We prove that, for a rank-R matrix reconstruction, DMF converges to a second-order critical point after R stages of SPE. This conclusion is further verified experimentally on a low-rank matrix reconstruction problem. This work provides a new theory for analyzing implicit regularization in deep learning.
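A minimal numpy sketch of the DMF setup described above, on a toy rank-2 reconstruction of my own construction (not the paper's experiment): gradient descent on a depth-3 factorization from a small random initialization recovers the low-rank target, and the learned product ends up (numerically) rank 2, consistent with the rank-R picture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rank-R target matrix, rescaled so its spectral norm is 5 to keep the
# step size stable (illustrative setup, not the paper's exact experiment).
n, R = 10, 2
M = rng.standard_normal((n, R)) @ rng.standard_normal((R, n))
M *= 5.0 / np.linalg.svd(M, compute_uv=False)[0]

# Deep matrix factorization (DMF): model M as a product of L = 3 full-size
# factors, trained by gradient descent from a small random initialization.
L, eps, lr, steps = 3, 1e-2, 0.02, 40000
Ws = [eps * rng.standard_normal((n, n)) for _ in range(L)]

def product(factors):
    """Product of a list of matrices (identity for the empty list)."""
    P = np.eye(n)
    for W in factors:
        P = P @ W
    return P

for _ in range(steps):
    G = product(Ws) - M  # gradient of 0.5 * ||W1 ... WL - M||_F^2 in the product
    # Simultaneous gradient step on every factor (chain rule through the product).
    Ws = [W - lr * product(Ws[:i]).T @ G @ product(Ws[i + 1:]).T
          for i, W in enumerate(Ws)]

P = product(Ws)
err = np.linalg.norm(P - M) / np.linalg.norm(M)
sv = np.linalg.svd(P, compute_uv=False)
# err is small and sv has only R non-negligible singular values: the product
# converged to a rank-2 reconstruction of the rank-2 target.
```

The small initialization scale `eps` is what makes the low-rank bias visible: directions not needed to fit M never grow away from their tiny initial values.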


Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks

Gidel, Gauthier, Bach, Francis, Lacoste-Julien, Simon

Neural Information Processing Systems

When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error. In such cases, the choice of the optimization algorithm and its respective hyper-parameters introduces biases that will lead to convergence to specific minimizers of the objective. Consequently, this choice can be considered as an implicit regularization for the training of over-parameterized models. In this work, we push this idea further by studying the discrete gradient dynamics of the training of a two-layer linear network with the least-squares loss. Using a time rescaling, we show that, with a vanishing initialization and a small enough step size, these dynamics sequentially learn the solutions of a reduced-rank regression with a gradually increasing rank.
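As a sketch of the sequential-learning claim in this abstract (a toy setup of my own, not the authors' experiment; with whitened inputs, least-squares regression reduces to fitting the target map A directly), gradient descent on a two-layer linear network with a vanishing initialization fits the singular values of A one at a time, largest first:

```python
import numpy as np

rng = np.random.default_rng(1)

# Target linear map with well-separated singular values (illustrative choice).
d = 6
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
V, _ = np.linalg.qr(rng.standard_normal((d, d)))
s_true = np.array([4.0, 2.0, 1.0, 0.0, 0.0, 0.0])
A = U @ np.diag(s_true) @ V.T

# Two-layer linear network W2 @ W1 with a vanishing initialization.
eps, lr, steps = 1e-4, 0.02, 4000
W1 = eps * rng.standard_normal((d, d))
W2 = eps * rng.standard_normal((d, d))

# Record the first step at which each of the top three singular values of the
# product reaches half of its target value.
hit = [None, None, None]
for t in range(steps):
    E = W2 @ W1 - A  # gradient of the loss 0.5 * ||W2 W1 - A||_F^2 in the product
    W1, W2 = W1 - lr * W2.T @ E, W2 - lr * E @ W1.T
    sv = np.linalg.svd(W2 @ W1, compute_uv=False)
    for i in range(3):
        if hit[i] is None and sv[i] >= 0.5 * s_true[i]:
            hit[i] = t

# Larger singular values are learned first (hit[0] < hit[1] < hit[2]): the
# dynamics passes near the rank-1, then rank-2, then rank-3 reduced-rank
# regression solutions before converging to A.
```

The separation between the learning times grows as the initialization scale `eps` shrinks, which is the regime the abstract's time rescaling is designed to capture.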